Variable selection bias in regression trees with constant fits

Authors

  • Yu-Shan Shih
  • Hsin-Wen Tsai
Abstract

The greedy search approach to variable selection in regression trees with constant fits is considered. At each node, the method usually compares the maximally selected statistic associated with each variable and selects the variable with the largest value to form the split. This method is shown to have selection bias if the predictor variables have different numbers of missing values, and the bias can be corrected by comparing the corresponding P-values instead. Methods related to some change-point problems are used to compute the P-values, and their performances are studied.

Keywords: change-point; maximally selected statistic; missing values; P-values
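As a rough illustration of the two selection rules described in the abstract, the sketch below contrasts the greedy rule (pick the variable with the largest maximally selected statistic) with the P-value-based rule on two predictors, one of which has many missing values. The two-sample t statistic, the permutation approximation of the P-value, and the simulated data are illustrative assumptions only; the paper itself obtains P-values from change-point approximations rather than permutation.

```python
# A minimal sketch, using NumPy only.  The maximally selected statistic here is
# the largest absolute two-sample t statistic over all splits of a predictor,
# computed on its non-missing cases.  The permutation P-value is a stand-in
# (an assumption made for illustration) for the change-point approximations
# used in the paper.
import numpy as np

def max_selected_stat(x, y, min_size=5):
    """Largest |t| over all splits 'x <= c', using only non-missing x."""
    ok = ~np.isnan(x)
    x, y = x[ok], y[ok]
    y = y[np.argsort(x)]
    n = len(y)
    best = 0.0
    for k in range(min_size, n - min_size):
        left, right = y[:k], y[k:]
        s2 = left.var(ddof=1) / k + right.var(ddof=1) / (n - k)
        if s2 > 0:
            best = max(best, abs(left.mean() - right.mean()) / np.sqrt(s2))
    return best

def perm_pvalue(x, y, n_perm=100, seed=None):
    """Permutation P-value of the maximally selected statistic."""
    rng = np.random.default_rng(seed)
    obs = max_selected_stat(x, y)
    perms = [max_selected_stat(x, rng.permutation(y)) for _ in range(n_perm)]
    return (1 + sum(p >= obs for p in perms)) / (n_perm + 1)

rng = np.random.default_rng(0)
n = 200
y = rng.normal(size=n)                     # response independent of both predictors
x1 = rng.normal(size=n)                    # fully observed predictor
x2 = rng.normal(size=n)
x2[rng.random(n) < 0.5] = np.nan           # predictor with many missing values

# Greedy rule: compare the maximally selected statistics directly
# (biased when the predictors have different numbers of missing values).
print("max stats:", max_selected_stat(x1, y), max_selected_stat(x2, y))
# Corrected rule: compare P-values and pick the smaller one.
print("P-values :", perm_pvalue(x1, y, seed=1), perm_pvalue(x2, y, seed=2))
```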

Similar articles

Visualizable and Interpretable Regression Models With Good Prediction Power

Many methods can fit models with higher prediction accuracy, on average, than least squares linear regression. But the models, including linear regression, are typically impossible to interpret or visualize. We describe a tree-structured method that fits a simple but non-trivial model to each partition of the variable space. This ensures that each piece of the fitted regression function can be ...

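A minimal sketch of the idea described in this entry, under simplifying assumptions: the predictor space is split once and a simple one-variable least-squares line is fitted in each piece, so that every piece of the fitted function remains easy to visualize. The single split on x1 and the per-piece model in x2 are illustrative choices, not the method proposed in the cited paper.

```python
# Piecewise simple models: one illustrative partition on x1, with a
# one-variable least-squares line in x2 fitted inside each piece.
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(-2, 2, size=300)
x2 = rng.uniform(-2, 2, size=300)
y = np.where(x1 < 0, 1 + 2 * x2, -1 - x2) + rng.normal(scale=0.3, size=300)

def fit_line(u, v):
    """Least-squares slope and intercept of v on u."""
    A = np.column_stack([u, np.ones_like(u)])
    coef, *_ = np.linalg.lstsq(A, v, rcond=None)
    return coef  # (slope, intercept)

left = x1 < 0
for name, mask in [("x1 < 0", left), ("x1 >= 0", ~left)]:
    slope, intercept = fit_line(x2[mask], y[mask])
    print(f"{name}: y = {intercept:.2f} + {slope:.2f} * x2")
```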


Regression Trees With Unbiased Variable Selection and Interaction Detection

We propose an algorithm for regression tree construction called GUIDE. It is specifically designed to eliminate variable selection bias, a problem that can undermine the reliability of inferences from a tree structure. GUIDE controls bias by employing chi-square analysis of residuals and bootstrap calibration of significance probabilities. This approach allows fast computation speed, natural ex...

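A minimal sketch of the residual-based selection step mentioned in this entry, under simplifying assumptions: residual signs from the node mean are cross-tabulated against quartile groups of each predictor, and predictors are ranked by the chi-square P-value. The quartile grouping and the toy data are illustrative; GUIDE itself adds further refinements such as bootstrap calibration of the significance probabilities.

```python
# Residual-sign chi-square screening of predictors at a node with a constant fit.
import numpy as np
from scipy.stats import chi2_contingency

def chi2_pvalue(x, resid_sign):
    """P-value of a chi-square test of residual sign vs. quartile group of x."""
    groups = np.digitize(x, np.quantile(x, [0.25, 0.5, 0.75]))  # 4 groups
    table = np.zeros((2, 4))
    for g, s in zip(groups, resid_sign):
        table[s, g] += 1
    return chi2_contingency(table)[1]

rng = np.random.default_rng(0)
n = 300
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = np.sin(2 * x1) + rng.normal(scale=0.5, size=n)   # y depends on x1 only

resid_sign = ((y - y.mean()) > 0).astype(int)        # sign of residual from node mean
for name, x in [("x1", x1), ("x2", x2)]:
    print(name, "P-value:", chi2_pvalue(x, resid_sign))
```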

Variable Selection Bias in Classification Trees Based on Imprecise Probabilities

Classification trees are a popular statistical tool with multiple applications. Recent advancements of traditional classification trees, such as the approach of classification trees based on imprecise probabilities by Abellán and Moral (2005), effectively address their tendency to overfitting. However, another flaw inherent in traditional classification trees is not eliminated by the imprecise ...


Variable Selection in Classification Trees Based on Imprecise Probabilities

Classification trees are a popular statistical tool with multiple applications. Recent advancements of traditional classification trees, such as the approach of classification trees based on imprecise probabilities by Abellán and Moral (2004), effectively address their tendency to overfitting. However, another flaw inherent in traditional classification trees is not eliminated by the imprecise ...



Journal title:
  • Computational Statistics & Data Analysis

Volume 45, Issue -

Pages -

Publication date: 2004